Evaluating Concrete Strength Model Performance

Using Cross-validation Methods

Sai Devarashetty, Mattick, Musson, Perez

2024-07-30

Introduction To Cross-validation

  • Measure performance and generalizability of predictive models.
  • Compare different models constructed from the same data set.

Cross-validation (CV) is widely used in fields including:

  • Machine Learning
  • Data Mining
  • Bioinformatics

CV is used to:

  • Minimize overfitting
  • Ensure a model generalizes to unseen data
  • Tune hyperparameters

Definitions

Generalizability:
How well predictive models created from a sample fit other samples from the same population.

Overfitting:
When a model fits the training data too closely, learning its idiosyncrasies rather than the underlying patterns.

Model fits characteristics specific to the training set:

  • Noise
  • Random fluctuations
  • Outliers

Hyperparameters:
Model configuration variables that are set before training, for example:

  • Nodes and layers in a neural network
  • Branches in a decision tree

Process

Subset the data into k approximately equal-sized folds:

  • Randomly
  • Without replacement

(Song, Tang, and Wee 2021)
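The random, without-replacement partition can be sketched in a few lines. (The analysis behind these slides used R; this Python snippet, with the made-up helper name `make_folds`, is purely illustrative.)

```python
import random

def make_folds(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k approximately equal folds.
    Sampling is without replacement: each index lands in exactly one fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]  # k interleaved slices of the shuffle

folds = make_folds(10, 3, seed=42)  # fold sizes 4, 3, 3
```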

Split the subsets into test and training sets

  • 1 test set
  • k-1 training sets

  • Fit the model to the training set
  • Apply the fitted model to the test set
  • Measure the prediction error

Repeat k Times

  • Train on each of the k combinations of k-1 folds
  • Test on each fold exactly once

Calculate the mean error

Bias-Variance Trade-Off

 

k-Fold vs. LOOCV
Method   Computation   Bias           Variance
k-Fold   Lower         Intermediate   Lower
LOOCV    Highest       Unbiased       High

k-fold where k = 5 or k = 10 is recommended:

  • Lower computational cost
  • Does not show excessive bias
  • Does not show excessive variance

(James et al. 2013), (Gorriz et al. 2024)

 

Model Measures of Error (MOE)

  • Measure the quality of fit of a model
  • Measuring error is a critical data modeling step
  • Different MOE for different data types

By measuring the quality of fit, we can select the model that generalizes best.

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{f}(x_i)| \tag{1} \]

  • A measure of error magnitude
  • The sign does not matter - absolute value
  • Lower magnitude indicates better fit
  • Take the mean absolute difference between:
    • observed \((y_i)\) and the predicted \(\hat{f}(x_i)\) values
  • \(n\) is the number of observations,
  • \(\hat{f}(x_i)\) is the model prediction \(\hat{f}\) for the ith observation
  • \(y_i\) is the observed value
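Equation (1) translates directly into code. A minimal Python sketch (the study's analysis was done in R; the function name `mae` is ours):

```python
def mae(y, y_hat):
    """Mean absolute error: the average magnitude of the residuals."""
    return sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / len(y)

print(mae([3.0, 5.0, 7.0], [2.5, 5.0, 8.5]))  # (0.5 + 0.0 + 1.5) / 3 ≈ 0.667
```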

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{f}(x_i))^2} \tag{2} \]

  • A measure of error magnitude
  • Lower magnitude indicates better fit
  • Error is weighted
    • Squaring the errors gives more weight to larger ones
    • Taking the square root returns the error to the same units as the response variable
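A Python sketch of Equation (2), again purely illustrative (the `rmse` helper name is ours):

```python
import math

def rmse(y, y_hat):
    """Root mean squared error: squaring weights larger errors more heavily;
    the square root returns the result to the units of the response."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y))

# Residuals 0.5, 0.0, 1.5: the largest error dominates the result.
print(round(rmse([3.0, 5.0, 7.0], [2.5, 5.0, 8.5]), 3))  # sqrt(2.5 / 3) ≈ 0.913
```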

\[ \text{R}^2 = \frac{SS_{tot}-SS_{res}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{3} \]

  • Proportion of the variance explained by the predictor(s)
  • A higher value indicates a better fit
    • An \(R^2\) value of 0.75 indicates 75% of the variance in the response variable is explained by the predictor(s)

(James et al. 2013), (Hawkins, Basak, and Mills 2003), (Helsel and Hirsch 1993)
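Equation (3) as a Python sketch (illustrative only; `r_squared` is a name we made up):

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the predictions."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

print(r_squared([3.0, 5.0, 7.0], [2.5, 5.0, 8.5]))  # 1 - 2.5/8 = 0.6875
```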

k-Fold Cross-Validation

\[ CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \text{Measure of Error}_i \tag{4} \]

(James et al. 2013),(Browne 2000)
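The whole k-fold procedure, culminating in the average of Equation (4), can be sketched in Python. The "model" here is a deliberately trivial stand-in (predict the training mean), just to keep the CV mechanics visible; the study itself fit regression and LightGBM models in R.

```python
import random

def k_fold_cv(y, k=5, seed=0):
    """Average a fold-level MAE over k random, non-overlapping folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        y_bar = sum(y[i] for i in train) / len(train)            # "fit" on k-1 folds
        mae = sum(abs(y[i] - y_bar) for i in test) / len(test)   # score held-out fold
        fold_errors.append(mae)
    return sum(fold_errors) / k   # mean of the k measures of error
```

Setting k equal to the number of observations makes every fold a single point, which is exactly LOOCV (Equation (5)).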

Leave-One-Out Cross-Validation (LOOCV)

\[ CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \text{Measure of Error}_i \tag{5} \]

(James et al. 2013),(Browne 2000)

Nested Cross-Validation

  • An outer CV loop estimates generalization performance
  • An inner CV loop, run on each outer training set, tunes hyperparameters
  • Data used for tuning never touches the data used for evaluation

(Berrar et al. 2019)
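A toy Python sketch of the nested structure. The "hyperparameter" here is an invented shrinkage factor `lam` for a trivial mean-predictor, chosen only so the inner loop has something to tune; it is not how the study's models were configured.

```python
import random

def folds_of(idx, k):
    """Split an index list into k interleaved folds."""
    return [idx[i::k] for i in range(k)]

def cv_error(y, idx, k, lam):
    """Inner-loop score: mean absolute error of the toy model lam * mean."""
    errs = []
    for test in folds_of(idx, k):
        held = set(test)
        train = [i for i in idx if i not in held]
        pred = lam * sum(y[i] for i in train) / len(train)
        errs.append(sum(abs(y[i] - pred) for i in test) / len(test))
    return sum(errs) / len(errs)

def nested_cv(y, k_outer=5, k_inner=3, grid=(0.5, 0.9, 1.0), seed=0):
    """Outer loop estimates performance; inner loop (on outer-training data
    only) selects the hyperparameter, so tuning never sees the test fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    outer_errs = []
    for test in folds_of(idx, k_outer):
        held = set(test)
        train = [i for i in idx if i not in held]
        best = min(grid, key=lambda lam: cv_error(y, train, k_inner, lam))
        pred = best * sum(y[i] for i in train) / len(train)  # refit, then score
        outer_errs.append(sum(abs(y[i] - pred) for i in test) / len(test))
    return sum(outer_errs) / len(outer_errs)
```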

Study Data

Yeh modeled the compressive strength of high-performance concrete (HPC) at various ages and with different ratios of components (I-C Yeh 1998).

The data are available for download from the UCI Machine Learning Repository:
UCI Repository HPC Data

(I-Cheng Yeh 2007)

Data Exploration and Visualization

  • Target variable:
    • Strength (MPa)
  • Predictor variables:
    • Cement (kg/m³)
    • Superplasticizer (kg/m³)
    • Age (days)
    • Water (kg/m³)

All variables are quantitative

Linear Regression Model

                   Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)        28.2578655   5.1878634     5.446918   1.0e-07
Cement              0.0668433   0.0039668    16.850539   0.0e+00
Superplasticizer    0.8716897   0.0903825     9.644449   0.0e+00
Age                 0.1110466   0.0069538    15.969235   0.0e+00
Water              -0.1195600   0.0257210    -4.648334   3.9e-06

\[ \hat{\text{Strength}} = 28.258 + 0.067\,\text{Cement} + 0.872\,\text{Superplasticizer} + 0.111\,\text{Age} - 0.120\,\text{Water} \]
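The fitted equation can be used directly for prediction. A Python sketch using the full-precision coefficients from the table above; the mix proportions in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
# Coefficients from the fitted linear model (full precision from the table).
coef = {"Intercept": 28.2578655, "Cement": 0.0668433,
        "Superplasticizer": 0.8716897, "Age": 0.1110466, "Water": -0.1195600}

def predict_strength(cement, superplasticizer, age, water):
    """Predicted compressive strength (MPa) from the regression equation."""
    return (coef["Intercept"] + coef["Cement"] * cement
            + coef["Superplasticizer"] * superplasticizer
            + coef["Age"] * age + coef["Water"] * water)

# Hypothetical mix: 300 kg/m³ cement, 10 kg/m³ superplasticizer,
# 28 days of curing, 180 kg/m³ water.
print(round(predict_strength(300, 10, 28, 180), 1))  # 38.6 (MPa)
```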

Linear Regression CV Results

  • k-Fold Results:
Measure of Error   Result
RMSE               12.13
MAE                 9.23
R²                  0.46

  • LOOCV Results:
Measure of Error   Result
RMSE               12.13
MAE                 9.23
R²                  0.46

  • Nested CV Results:
Measure of Error   Result
RMSE               11.87
MAE                 9.43
R²                  0.49

LightGBM Model

 

Measure of Error   Result
RMSE               8.73
MAE                6.82
R²                 0.73

 


  • Ensemble of decision trees
  • Uses gradient boosting
  • Final prediction is the sum of predictions from all individual trees
  • Feature importance

LightGBM CV Results

  • k-Fold Results:
Measure of Error   Result
RMSE               8.73
MAE                6.82
R²                 0.73

  • LOOCV Results:
Measure of Error   Result
RMSE               5.93
MAE                4.32
R²                 0.87

  • Nested CV Results:
Measure of Error   Result
RMSE               8.27
MAE                6.39
R²                 0.75

Comparison of Models

  • Performance Comparison: Linear Regression vs. LightGBM
  • Advantages and disadvantages of each model
Method   Measure of Error   Linear Regression   LightGBM
5-Fold   RMSE               12.13                8.73
5-Fold   MAE                 9.23                6.82
5-Fold   R²                  0.46                0.73
LOOCV    RMSE               12.13                5.93
LOOCV    MAE                 9.23                4.32
LOOCV    R²                  0.46                0.87
NCV      RMSE               11.87                8.27
NCV      MAE                 9.43                6.39
NCV      R²                  0.49                0.75

Model Comparison k-Fold Plot

Model Comparison LOOCV Plot

Model Comparison Nested CV Plot

Conclusion: Overview

  • Evaluation of Two Models:
    • Linear Regression Model
    • LightGBM Model

  • Cross-Validation Methods Used:
    • k-fold Cross-Validation
    • Leave-One-Out Cross-Validation (LOOCV)
    • Nested Cross-Validation

Conclusion: Key Findings

  • Model Performance:
    • LightGBM consistently outperformed Linear Regression
    • Linear Regression provided baseline insights into linear relationships

  • Cross-Validation Insights:
    • k-fold CV showed LightGBM’s superior generalization
    • LOOCV confirmed robustness across individual data points
    • Nested CV mitigated overfitting, ensuring genuine predictive power

Conclusion: Implications and Future Directions

  • Implications for Future Research:
    • Importance of advanced cross-validation techniques
    • Enhancing model validation processes
    • Ensuring model generalizability and reliability across various applications

  • Future Directions:
    • Continuous refinement of cross-validation methods
    • Exploration of implications in different predictive modeling scenarios
    • Development of robust predictive models through improved validation processes

References

All figures were created by the authors.

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2024. Rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.
Bates, Douglas, Martin Maechler, and Mikael Jagan. 2024. Matrix: Sparse and Dense Matrix Classes and Methods. https://Matrix.R-forge.R-project.org.
Berrar, Daniel et al. 2019. “Cross-Validation.”
Browne, Michael W. 2000. “Cross-Validation Methods.” Journal of Mathematical Psychology 44 (1): 108–32.
Gorriz, Juan M, Fermı́n Segovia, Javier Ramirez, Andrés Ortiz, and John Suckling. 2024. “Is k-Fold Cross Validation the Best Model Selection Method for Machine Learning?” arXiv Preprint arXiv:2401.16407.
Hamner, Ben, and Michael Frasco. 2018. Metrics: Evaluation Metrics for Machine Learning. https://github.com/mfrasco/Metrics.
Hawkins, Douglas M, Subhash C Basak, and Denise Mills. 2003. “Assessing Model Fit by Cross-Validation.” Journal of Chemical Information and Computer Sciences 43 (2): 579–86.
Helsel, Dennis R, and Robert M Hirsch. 1993. Statistical Methods in Water Resources. Elsevier.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. 2013. An Introduction to Statistical Learning. Vol. 112. Springer.
Kuhn, Max. 2023. Caret: Classification and Regression Training. https://github.com/topepo/caret/.
Leisch, Friedrich, and Evgenia Dimitriadou. 2024. Mlbench: Machine Learning Benchmark Problems. https://CRAN.R-project.org/package=mlbench.
R Core Team. 2024a. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
———. 2024b. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Shi, Yu, Guolin Ke, Damien Soukhavong, James Lamb, Qi Meng, Thomas Finley, Taifeng Wang, et al. 2024. Lightgbm: Light Gradient Boosting Machine. https://github.com/Microsoft/LightGBM.
Song, Q Chelsea, Chen Tang, and Serena Wee. 2021. “Making Sense of Model Generalizability: A Tutorial on Cross-Validation in r and Shiny.” Advances in Methods and Practices in Psychological Science 4 (1): 2515245920947067.
Wei, Taiyun, and Viliam Simko. 2021. Corrplot: Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2023. Stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and Teun van den Brand. 2024. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://readr.tidyverse.org.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2024. Tidyr: Tidy Messy Data. https://tidyr.tidyverse.org.
Xie, Yihui. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.
Yeh, I-C. 1998. “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete Research 28 (12): 1797–1808.
Yeh, I-Cheng. 2007. “Concrete Compressive Strength.” UCI Machine Learning Repository.